Scaling up neural networks has led to remarkable performance across a wide range of tasks. Moreover, performance often follows reliable scaling laws as a function of training set size, model size, and compute, which offers valuable guidance as large-scale experiments are becoming increasingly expensive. However, previous work on scaling laws has primarily used private data and models or focused on uni-modal language or vision learning. To address these limitations, we investigate scaling laws for contrastive language-image pre-training (CLIP) with the public LAION dataset and the open-source OpenCLIP repository. Our large-scale experiments involve models trained on up to two billion image-text pairs and identify power law scaling for multiple downstream tasks including zero-shot classification, retrieval, linear probing, and end-to-end fine-tuning. We find that the training distribution plays a key role in scaling laws, as the OpenAI and OpenCLIP models exhibit different scaling behavior despite identical model architectures and similar training recipes. We open-source our evaluation workflow and all models, including the largest public CLIP models, to ensure reproducibility and make scaling laws research more accessible. Source code and instructions to reproduce this study will be available at https://github.com/LAION-AI/scaling-laws-openclip
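As a concrete illustration of the kind of power-law relationship described above, the sketch below fits E = beta * C^alpha to (compute, error) pairs via least squares in log-log space. This is a minimal sketch of the general fitting procedure, not the exact code or data used in the study; the numerical values are invented placeholders.

```python
# Minimal sketch: fit a power law E = beta * C**alpha to (compute, error) points
# by linear regression in log-log space. Data values are illustrative placeholders,
# not results from the paper.
import numpy as np

compute = np.array([1e9, 1e10, 1e11, 1e12])   # hypothetical training compute budgets
error = np.array([0.55, 0.42, 0.33, 0.26])    # hypothetical zero-shot top-1 error

# log E = alpha * log C + log beta  ->  ordinary least squares on the logs
alpha, log_beta = np.polyfit(np.log(compute), np.log(error), deg=1)
beta = np.exp(log_beta)
print(f"fitted power law: E ~= {beta:.3f} * C^{alpha:.3f}")

# Extrapolate to a larger compute budget (again purely illustrative).
c_new = 1e13
print(f"predicted error at C={c_new:.0e}: {beta * c_new**alpha:.3f}")
```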
Multi-modal language-vision models trained on hundreds of millions of image-text pairs (e.g., CLIP, DALL-E) have seen a recent surge, showing a remarkable capability to perform zero- or few-shot learning and transfer even in the absence of per-sample labels on the target image data. Despite this trend, to date no publicly available dataset of sufficient scale exists for training such models from scratch. To address this problem, in a community effort we build and release LAION-400M, a dataset of 400 million CLIP-filtered image-text pairs, together with their CLIP embeddings and kNN indices that allow efficient similarity search.
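To make the role of the released CLIP embeddings and kNN indices concrete, the sketch below runs a cosine-similarity nearest-neighbor query over a small embedding matrix with plain NumPy. It is an illustrative stand-in: the actual release relies on precomputed CLIP embeddings and dedicated (approximate) index structures, and the array shapes and query vector here are invented.

```python
# Illustrative sketch of kNN similarity search over CLIP-style embeddings with NumPy.
# The embeddings and query below are random stand-ins, not LAION-400M data.
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 512                      # pretend corpus size and CLIP embedding width
embeddings = rng.normal(size=(n, d)).astype(np.float32)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)  # unit-normalize rows

def knn(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k most cosine-similar embeddings to `query`."""
    q = query / np.linalg.norm(query)
    scores = embeddings @ q             # cosine similarity on unit vectors
    return np.argsort(-scores)[:k]

query = rng.normal(size=d).astype(np.float32)   # e.g., a CLIP text embedding
print(knn(query, k=5))
```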
Increasing model, data, and compute budget scale during pre-training has been shown to strongly improve model generalization and transfer learning in a vast line of work on language modeling and natural image recognition. However, most studies on the benefit of larger scale were conducted in an in-domain setting, with source and target data in close proximity. To study the effect of larger scale in both in-domain and out-of-domain settings when performing full and few-shot transfer, we combine here for the first time large, openly available medical chest X-Ray imaging datasets to reach a scale in the medical imaging domain comparable to ImageNet-1k, which is routinely used for pre-training in the natural image domain. We then conduct supervised pre-training while varying network size and source data scale and domain, the source being either large natural (ImageNet-1k/21k) or large medical chest X-Ray datasets, and transfer the pre-trained models to different natural or medical targets. We observe strong improvement due to larger pre-training scale for intra-domain (natural-to-natural and medical-to-medical) transfer. For inter-domain (natural-to-medical) transfer, we find improvements due to larger pre-training scale on larger X-Ray targets in the full-shot regime, while for smaller targets and in the few-shot regime the improvement is not visible. Remarkably, large networks pre-trained on the very large natural ImageNet-21k are as good as or better than networks pre-trained on the largest available medical X-Ray data when transferring to large X-Ray targets. We conclude that substantially increasing model and generic, medical domain-agnostic natural image source data scale during pre-training can enable high-quality out-of-domain transfer to medical domain-specific targets, removing the dependency on large medical domain-specific source data that is often not available in practice.
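The sketch below illustrates the general transfer recipe the abstract describes: take a backbone pre-trained on a natural-image source, replace its classification head for the target task, and fine-tune end to end. It is a generic PyTorch/torchvision sketch under assumed settings (ResNet-50 backbone, 14 multi-label target classes, a hypothetical data loader), not the paper's exact models, datasets, or hyperparameters.

```python
# Generic transfer-learning sketch (not the paper's exact models or datasets):
# take an ImageNet-pretrained backbone, swap the classification head for the
# target task, and fine-tune end to end on a (hypothetical) target loader.
import torch
import torch.nn as nn
from torchvision import models

num_target_classes = 14                     # assumed number of findings in a chest X-Ray target task
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)  # natural-image pre-training
model.fc = nn.Linear(model.fc.in_features, num_target_classes)          # new head for the target

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()          # multi-label setup, common for chest X-Ray targets

def finetune_one_epoch(loader):
    """One epoch of full-shot fine-tuning; `loader` yields (images, multi-hot labels)."""
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels.float())
        loss.backward()
        optimizer.step()
```

Few-shot transfer follows the same pattern with the loader restricted to a small number of labeled examples per class.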
To test the generalization capability of a class of deep neural networks, we randomly generate a large variety of different rule sets for 2-D cellular automata (CA) based on John Conway's Game of Life. Using these rules, we compute several trajectories for each CA instance. A deep convolutional encoder-decoder network with short- and long-range skip connections is trained on various generated CA trajectories to predict the next CA state given its previous states. The results show that the network is able to learn the rules of diverse, complex cellular automata and to generalize to unseen configurations. To some extent, the network also generalizes to rule sets and neighborhood sizes not seen during training. Code to reproduce the experiments is publicly available at: https://github.com/slampai/generalization-cellular-automata
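The sketch below shows one way to generate training trajectories of the kind described above: sample random birth/survival sets for Life-like (outer-totalistic) 2-D cellular automata on a Moore neighborhood and roll them forward. The rule parameterization and boundary handling here are assumptions for illustration; the paper's exact rule space and data pipeline may differ.

```python
# Sketch of generating trajectories for randomly sampled Life-like (outer-totalistic)
# 2-D cellular automata with a Moore neighborhood and periodic boundaries.
# The rule parameterization is an assumption for illustration only.
import numpy as np

rng = np.random.default_rng(42)

def random_rule():
    """Sample random birth/survival sets over possible live-neighbor counts 0..8."""
    birth = set(rng.choice(9, size=rng.integers(1, 5), replace=False).tolist())
    survive = set(rng.choice(9, size=rng.integers(1, 5), replace=False).tolist())
    return birth, survive

def step(grid, birth, survive):
    """One CA update: count Moore neighbors with periodic boundaries, apply the rule."""
    neighbors = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
    born = (grid == 0) & np.isin(neighbors, list(birth))
    stay = (grid == 1) & np.isin(neighbors, list(survive))
    return (born | stay).astype(np.uint8)

def trajectory(size=64, steps=10):
    """Roll out one trajectory under a freshly sampled rule; returns (steps+1, size, size)."""
    birth, survive = random_rule()
    grid = (rng.random((size, size)) < 0.5).astype(np.uint8)
    states = [grid]
    for _ in range(steps):
        states.append(step(states[-1], birth, survive))
    return np.stack(states)

print(trajectory().shape)  # (11, 64, 64)
```

Conway's Game of Life itself corresponds to the special case birth = {3}, survive = {2, 3}; consecutive pairs of states in such trajectories serve as (input, target) examples for the next-state prediction task.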